{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fuzzy-String-Matching\n",
    "\n",
    "## [Link to my Youtube Video Explaining this whole Notebook](https://www.youtube.com/watch?v=_IGdekeBCoE&list=PLxqBkZuBynVTn2lkHNAcw6lgm1MD5QiMK&index=7)\n",
    "\n",
    "[![Imgur](https://imgur.com/4RQ499n.png)](https://www.youtube.com/watch?v=_IGdekeBCoE&list=PLxqBkZuBynVTn2lkHNAcw6lgm1MD5QiMK&index=7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thefuzz import fuzz as fz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.18 s, sys: 4.14 ms, total: 2.18 s\n",
      "Wall time: 2.19 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "for i in range(1, 100000):\n",
    "    fz.ratio('A Large Cap Stock', 'A large Cap Stock')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import rapidfuzz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 21 ms, sys: 0 ns, total: 21 ms\n",
      "Wall time: 20.8 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "for i in range(1, 100000):\n",
    "    rapidfuzz.fuzz.ratio('A Large Cap Stock', 'A large Cap Stock')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import rapidfuzz.fuzz as fuzz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Partial Ratio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str1 = \"My name is Rohan\"\n",
    "Str2 = \"My name is Rohan Paul\"\n",
    "\n",
    "fuzz.partial_ratio(Str1, Str2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Token Set Ratio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str3 = \"Hi Long time no see\"\n",
    "Str4 = \"Hi After a long time seeing you\"\n",
    "\n",
    "fuzz.token_set_ratio(Str1, Str2)\n",
    "\n",
    "# fuzz.token_set_ratio(Str3, Str4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### W Ratio: \n",
    "\n",
    "Calculates a weighted ratio based on the other ratio algorithms. It depends on the number of times a word occurs, order of the tokens, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.87096774193549"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str5 = \"Hi, ar you alright \"\n",
    "\n",
    "Str6 = \"Hi you alight\"\n",
    "\n",
    "fuzz.WRatio(Str5, Str6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q Ratio: \n",
    "\n",
    "It is very similar to fuzzy.ratio except this pre-processes the string before calculating the distance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.87096774193549"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str5 = \"Hi, ar you alright \"\n",
    "Str6 = \"Hi you alight\"\n",
    "\n",
    "fuzz.QRatio(Str5, Str6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## String Metric Module:\n",
    "This module is responsible for the type of various edit distance.\n",
    "\n",
    "### Jaro Similarity: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "87.47638326585695"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str5 = \"Hi, ar you alright \"\n",
    "\n",
    "Str6 = \"Hi you alight\"\n",
    "\n",
    "rapidfuzz.string_metric.jaro_winkler_similarity(Str5, Str6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hamming Distance: It describes the minimum amount of substitutions required to transform s1 into s2. For hamming distance comparison, two strings need to be of the same length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Str5 = \"Hi are alright\"\n",
    "\n",
    "Str6 = \"Hi you alright\"\n",
    "\n",
    "rapidfuzz.string_metric.hamming(Str5, Str6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cgi import test\n",
    "from hashlib import algorithms_available\n",
    "\n",
    "\n",
    "just some test things\n",
    "\n",
    "another test and algorithms_available"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.13 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
